Skip to content

Retry Handler Support when call activity or suborchestration#96

Merged
torosent merged 22 commits intomainfrom
wangbill/retryhandler
Feb 7, 2026
Merged

Retry Handler Support when call activity or suborchestration#96
torosent merged 22 commits intomainfrom
wangbill/retryhandler

Conversation

@YunchuWang
Copy link
Copy Markdown
Member

@YunchuWang YunchuWang commented Feb 6, 2026

Summary

What changed?

New Features: Retry Handler Support

This PR adds support for imperative retry control through AsyncRetryHandler and RetryHandler functions, allowing users to make custom retry decisions based on failure context. This mirrors the .NET SDK's InvokeWithCustomRetryHandler pattern.

New Types & APIs:

  • AsyncRetryHandler - Async function delegate for imperative retry decisions
  • RetryHandler - Sync function delegate for imperative retry decisions
  • RetryContext - Context object passed to handlers with failure details, attempt count, elapsed time
  • toAsyncRetryHandler() - Utility to normalize sync handlers to async
  • taskOptionsFromRetryHandler() - Creates TaskOptions from an AsyncRetryHandler
  • taskOptionsFromSyncRetryHandler() - Creates TaskOptions from a sync RetryHandler
  • TaskRetryOptions - Union type supporting RetryPolicy | AsyncRetryHandler | RetryHandler

Enhanced RetryPolicy:

  • Added handleFailure predicate option for filtering retryable failures by error type/message
  • Added retryTimeoutInMilliseconds option for overall retry timeout
  • Added shouldRetry() method to evaluate failures against the predicate

New Internal Classes:

  • RetryTaskBase<T> - Abstract base class for retry tasks, shared between policy and handler approaches
  • RetryHandlerTask<T> - Task implementation for handler-based retry (immediate reschedule, no timer delay)

Orchestration Context Changes:

  • callActivity() and callSubOrchestrator() now accept TaskOptions.retry as RetryPolicy, AsyncRetryHandler, or RetryHandler
  • Added rescheduleRetryTask() for immediate retry rescheduling (used by handler-based retry)

Bug Fixes

  • Fixed error type detection to use e.name instead of e.constructor.name, preserving custom error names set via error.name = "CustomType"
  • Fixed retry logging to only log "Retrying task" after the handler approves the retry

Why is this change needed?

  1. Imperative retry control: Users need fine-grained control over retry decisions that can't be expressed declaratively (e.g., retry based on error message content, external state, or complex conditions)
  2. Error type filtering: The handleFailure predicate enables fail-fast behavior for non-retryable errors (e.g., validation errors, authorization failures)
  3. Parity with .NET SDK: Aligns with the .NET Durable Functions SDK's InvokeWithCustomRetryHandler pattern
  4. Custom error names: Users can now throw errors with error.name = "PermanentError" and filter on that in retry handlers

Issues / work items


Project checklist

  • Release notes are not required for the next release
    • Otherwise: Notes added to CHANGELOG.md
  • Backport is not required
    • Otherwise: Backport tracked by issue/PR #issue_or_pr
  • All required tests have been added/updated (unit tests, E2E tests)
  • Breaking change?
    • If yes:
      • Impact: None - All changes are additive. Existing RetryPolicy usage continues to work unchanged.
      • Migration guidance: N/A

AI-assisted code disclosure (required)

Was an AI tool used? (select one)

  • No
  • Yes, AI helped write parts of this PR (e.g., GitHub Copilot)
  • Yes, an AI agent generated most of this PR

If AI was used:

  • Tool(s): GitHub Copilot
  • AI-assisted areas/files:
    • Bug fix in pb-helper.util.ts (error name detection)
    • Bug fix in orchestration-executor.ts (retry logging order)
  • What you changed after AI output: Reviewed and verified fixes address all failing tests

AI verification (required if AI was used):

  • I understand the code and can explain it
  • I verified referenced APIs/types exist and are correct
  • I reviewed edge cases/failure paths (timeouts, retries, cancellation, exceptions)
  • I reviewed concurrency/async behavior
  • I checked for unintended breaking or behavior changes

Testing

Automated tests

  • Result: Passed

New Unit Tests (7 test files, 100+ test cases):

  • retry-context.spec.ts - RetryContext creation and properties
  • retry-handler.spec.ts - RetryHandler/AsyncRetryHandler types and toAsyncRetryHandler()
  • retry-handler-task.spec.ts - RetryHandlerTask shouldRetry() behavior
  • retry-policy-handle-failure.spec.ts - handleFailure predicate functionality
  • retryable-task.spec.ts - RetryableTask with enhanced policy features
  • task-options-retry-handler.spec.ts - TaskOptions creation helpers
  • orchestration_executor.spec.ts - Updated for new retry scenarios

New E2E Tests (2 test files, 15 test cases):

  • retry-handler.spec.ts - AsyncRetryHandler with activities and sub-orchestrations
  • retry-advanced.spec.ts - handleFailure predicate, timeout, and combined features

Manual validation (only if runtime/behavior changed)

  • Environment (OS, Node.js version, components): Windows, Node.js 22.x, DTS emulator (Docker)
  • Steps + observed results:
    1. npm run build - Successful
    2. npm run lint - Passed
    3. npm run test - All unit tests passed
    4. npx jest test/e2e-azuremanaged/retry-*.spec.ts --runInBand --forceExit - All 15 E2E tests passed
  • Evidence: Error types like TransientError, PermanentError, ValidationError correctly captured and used in retry decisions

Notes for reviewers

Key Design Decisions

  1. RetryTaskBase abstraction: Created a shared base class to avoid code duplication between RetryableTask (policy-based) and RetryHandlerTask (handler-based)

  2. Immediate vs delayed retry:

    • RetryableTask (policy) uses timers for backoff delays
    • RetryHandlerTask (handler) reschedules immediately - delay logic is the handler's responsibility
  3. Handler runs as orchestrator code: Like .NET SDK, the retry handler is subject to replay determinism requirements. RetryContext.orchestrationContext.isReplaying is exposed for guarding side-effects.

  4. Sync/Async handler support: Both sync RetryHandler and async AsyncRetryHandler are supported. Sync handlers are automatically wrapped via toAsyncRetryHandler().

Files Changed Summary

Category Files
New types/APIs retry-handler.ts, retry-context.ts, task-options.ts
Enhanced policy retry-policy.ts, retryable-task.ts
New task class retry-handler-task.ts, retry-task-base.ts
Executor integration orchestration-executor.ts, runtime-orchestration-context.ts
Bug fixes pb-helper.util.ts
Exports index.ts, retry/index.ts, options/index.ts
Unit tests 7 new/updated test files
E2E tests 2 new test files

Copilot AI review requested due to automatic review settings February 6, 2026 01:11
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds selective retry support to the Durable Task JS SDK by introducing a handleFailure predicate to the RetryPolicy class, along with RetryHandler and AsyncRetryHandler types for future custom retry logic. It also fixes a critical bug in error type serialization that prevented error types from being accurately represented in TaskFailureDetails.

Changes:

  • Added handleFailure predicate to RetryPolicy for filtering which errors should be retried (aligned with .NET SDK's RetryPolicy.Handle delegate)
  • Fixed error type serialization to use e.name || e.constructor.name and added this.name to all custom Error classes to ensure accurate error type identification
  • Extended type system to accept RetryHandler and AsyncRetryHandler in TaskOptions (types defined but not yet fully implemented in orchestration executor)

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
packages/durabletask-js/src/task/retry/retry-policy.ts Added handleFailure predicate, FailureHandlerPredicate type, and shouldRetry() method to RetryPolicy
packages/durabletask-js/src/task/retry/retry-context.ts New file: RetryContext interface and factory function for retry handler context
packages/durabletask-js/src/task/retry/retry-handler.ts New file: RetryHandler and AsyncRetryHandler type definitions with wrapper function
packages/durabletask-js/src/task/retryable-task.ts Integrated handleFailure predicate check into computeNextDelayInMilliseconds()
packages/durabletask-js/src/utils/pb-helper.util.ts Fixed error type serialization to use e.name || e.constructor.name
packages/durabletask-js/src/task/exception/*.ts Added this.name assignments to TaskFailedError, OrchestrationStateError
packages/durabletask-js/src/orchestration/exception/*.ts Added this.name assignment to OrchestrationFailedError
packages/durabletask-js/src/exception/timeout-error.ts Added this.name assignment to TimeoutError
packages/durabletask-js/src/task/options/task-options.ts Added TaskRetryOptions union type, helper functions, and type guards
packages/durabletask-js/src/task/failure-details.ts Added TaskFailureDetails interface implemented by FailureDetails class
packages/durabletask-js/src/worker/runtime-orchestration-context.ts Added isRetryPolicy type guard to filter RetryPolicy from RetryHandler options
packages/durabletask-js/src/task/retry/index.ts Exported new retry-related types and functions
packages/durabletask-js/src/task/options/index.ts Exported new task option helper functions and type guards
packages/durabletask-js/src/index.ts Exported all new public API types and functions
test/e2e-azuremanaged/retry-advanced.spec.ts New E2E tests for handleFailure filtering, maxRetryInterval capping, and retryTimeout
packages/durabletask-js/test/*.spec.ts New unit tests for RetryContext, RetryHandler, handleFailure predicate, and task options

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/durabletask-js/src/task/retry/retry-policy.ts Outdated
Comment thread packages/durabletask-js/src/task/retry/retry-handler.ts Outdated
Comment thread packages/durabletask-js/src/task/options/task-options.ts Outdated
Comment thread packages/durabletask-js/src/utils/pb-helper.util.ts Outdated
Comment thread packages/durabletask-js/src/task/retry/retry-context.ts
Comment thread packages/durabletask-js/src/task/retry/retry-policy.ts Outdated
YunchuWang and others added 13 commits February 6, 2026 09:40
- Introduced RetryHandlerTask to support imperative retry control using AsyncRetryHandler.
- Refactored RuntimeOrchestrationContext to utilize createRetryTaskOrDefault for task creation.
- Enhanced retry logic to handle both RetryableTask and RetryHandlerTask.
- Added comprehensive tests for RetryHandlerTask, covering various scenarios including success, failure, and retry logic.
- Updated orchestration executor tests to validate new retry handler functionality.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Member

@torosent torosent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comprehensive Code Review: PR #96 — Retry Handler Support

Overall Assessment

Well-structured PR with good test coverage (~1,400 lines of tests) and excellent JSDoc documentation. The RetryTaskBase extraction is a clean refactoring, and the inheritance hierarchy is logical. However, there are two high-severity issues that should be addressed before merge.

Summary of Findings

Severity Category Issue
High Correctness e.name change is a breaking change for custom Error classes without this.name — other Error subclasses in the codebase aren't updated
High Performance/Design RetryHandlerTask retries are always immediate (no backoff) — could spam the backend
Medium Correctness tryHandleRetry uses sequential if instead of else if for mutually exclusive types
Medium Correctness isCancelled is always false — dead code path in the public API
Medium Correctness Stale action reference after rescheduling
Medium Correctness Sub-orchestration retry reuses the same instance ID
Low Performance Duplicated protobuf-to-plain-object extraction in two classes
Low Performance toAsyncRetryHandler double-wrapping from taskOptionsFromSyncRetryHandler
Low Best Practice No input validation on handler in RetryHandlerTask constructor
Info Memory Circular ref between RetryHandlerTask and context — acceptable for V8
Info Simplicity Helper function proliferation in task-options.ts

What's Good

  • Clean class hierarchy: Task → CompletableTask → RetryTaskBase → RetryableTask|RetryHandlerTask avoids duplication
  • Comprehensive tests: Unit tests cover edge cases (non-retriable failures, exhausted retries, replay-safe context), plus E2E tests
  • Excellent JSDoc: Every public type has @remarks, @example, and @param documentation, including determinism requirements
  • Proper export organization: New exports well-organized through barrel files, implementation details kept internal
  • Correct cleanup: _pendingTasks cleanup during retry is handled correctly, complete() properly clears failure state

Comment thread packages/durabletask-js/src/worker/orchestration-executor.ts
Comment thread packages/durabletask-js/src/task/retry-task-base.ts
Comment thread packages/durabletask-js/src/task/retry-handler-task.ts
Comment thread packages/durabletask-js/src/worker/runtime-orchestration-context.ts
Comment thread packages/durabletask-js/src/utils/pb-helper.util.ts
Comment thread packages/durabletask-js/src/task/options/task-options.ts Outdated
Comment thread packages/durabletask-js/src/task/retry-handler-task.ts
Comment thread packages/durabletask-js/src/worker/runtime-orchestration-context.ts
Comment thread packages/durabletask-js/src/task/options/task-options.ts
Comment thread packages/durabletask-js/src/worker/runtime-orchestration-context.ts
Copy link
Copy Markdown
Member

@torosent torosent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second Review — Post-Update Analysis

All 12 issues from the first review have been addressed — excellent responsiveness! The retry handler delay support (RetryHandlerResult = boolean | number) is well-designed and the code quality has improved significantly.

4 new findings in this follow-up:

# Severity Finding
1 Medium isNonRetriable flag not checked in RetryableTask (only checked in RetryHandlerTask)
2 Medium Version field dropped when rescheduling retry tasks
3 Low Handler returning 0 crashes orchestration instead of graceful handling
4 Low Missing integration test for delay-based retry handler path

Comment thread packages/durabletask-js/src/task/retryable-task.ts
Comment thread packages/durabletask-js/src/worker/runtime-orchestration-context.ts
Comment thread packages/durabletask-js/src/worker/orchestration-executor.ts
Comment thread packages/durabletask-js/test/orchestration_executor.spec.ts
@torosent torosent merged commit ce0c399 into main Feb 7, 2026
10 checks passed
@torosent torosent deleted the wangbill/retryhandler branch February 7, 2026 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants